This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Ctrl+Shift+Enter.

# First part

## Data filtering and normalization
source("./tianfengRwrappers.R")
Loading required package: dplyr
Attaching package: ‘dplyr’
The following object is masked from ‘package:matrixStats’:
count
The following object is masked from ‘package:Biobase’:
combine
The following objects are masked from ‘package:GenomicRanges’:
intersect, setdiff, union
The following object is masked from ‘package:GenomeInfoDb’:
intersect
The following objects are masked from ‘package:IRanges’:
collapse, desc, intersect, setdiff, slice, union
The following objects are masked from ‘package:S4Vectors’:
first, intersect, rename, setdiff, setequal, union
The following objects are masked from ‘package:BiocGenerics’:
combine, intersect, setdiff, union
The following objects are masked from ‘package:stats’:
filter, lag
The following objects are masked from ‘package:base’:
intersect, setdiff, setequal, union
Loading required package: reticulate
Loading required package: tidyr
Attaching package: ‘tidyr’
The following object is masked from ‘package:S4Vectors’:
expand
Attaching package: ‘MySeuratWrappers’
The following objects are masked from ‘package:Seurat’:
DimPlot, DoHeatmap, LabelClusters, RidgePlot, VlnPlot
Attaching package: ‘cowplot’
The following object is masked from ‘package:ggpubr’:
get_legend
Loading required package: viridisLite
Attaching package: ‘reshape2’
The following object is masked from ‘package:tidyr’:
smiths
NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
Registered S3 method overwritten by 'enrichplot':
method from
fortify.enrichResult DOSE
clusterProfiler v3.14.3 For help: https://guangchuangyu.github.io/software/clusterProfiler
If you use clusterProfiler in published research, please cite:
Guangchuang Yu, Li-Gen Wang, Yanyan Han, Qing-Yu He. clusterProfiler: an R package for comparing biological themes among gene clusters. OMICS: A Journal of Integrative Biology. 2012, 16(5):284-287.
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:DelayedArray’:
simplify
Registering fonts with R
Attaching package: ‘plotly’
The following object is masked from ‘package:ggplot2’:
last_plot
The following object is masked from ‘package:IRanges’:
slice
The following object is masked from ‘package:S4Vectors’:
rename
The following object is masked from ‘package:stats’:
filter
The following object is masked from ‘package:graphics’:
layout
Loading required package: e1071
Attaching package: ‘widgetTools’
The following object is masked from ‘package:dplyr’:
funs
Attaching package: ‘DynDoc’
The following object is masked from ‘package:DelayedArray’:
path
The following object is masked from ‘package:BiocGenerics’:
path
Attaching package: ‘DT’
The following object is masked from ‘package:Seurat’:
JS
========================================
circlize version 0.4.13
CRAN page: https://cran.r-project.org/package=circlize
Github page: https://github.com/jokergoo/circlize
Documentation: https://jokergoo.github.io/circlize_book/book/
If you use it in published research, please cite:
Gu, Z. circlize implements and enhances circular visualization
in R. Bioinformatics 2014.
This message can be suppressed by:
suppressPackageStartupMessages(library(circlize))
========================================
Loading required package: grid
========================================
ComplexHeatmap version 2.2.0
Bioconductor page: http://bioconductor.org/packages/ComplexHeatmap/
Github page: https://github.com/jokergoo/ComplexHeatmap
Documentation: http://jokergoo.github.io/ComplexHeatmap-reference
If you use it in published research, please cite:
Gu, Z. Complex heatmaps reveal patterns and correlations in multidimensional
genomic data. Bioinformatics 2016.
========================================
Attaching package: ‘ComplexHeatmap’
The following object is masked from ‘package:plotly’:
add_heatmap
human_coronary_countmatrix <- read.csv("GSE131778_human_coronary_scRNAseq.txt", sep = "\t")
# Swap the two dot-separated fields of a column name ("a.b" becomes "b_a"), so the
# second field is the first "_"-delimited field, which CreateSeuratObject() uses
# for orig.ident by default (names.field = 1, names.delim = "_")
func <- function(s) {
  parts <- strsplit(s, ".", fixed = TRUE)[[1]]
  paste0(parts[2], "_", parts[1])
}
colnames(human_coronary_countmatrix) <- vapply(
  colnames(human_coronary_countmatrix), func, character(1), USE.NAMES = FALSE
) # split out the sample ID
human_coronary <- CreateSeuratObject(counts = human_coronary_countmatrix,
  project = "human_coronary", min.cells = 10, min.features = 300) %>%
  PercentageFeatureSet(pattern = "^MT-", col.name = "percent.mt") %>%
  subset(subset = nFeature_RNA > 600 & nFeature_RNA < 6000 & nCount_RNA > 1000 & nCount_RNA < 30000) %>%
  SCTransform(vars.to.regress = "percent.mt", verbose = FALSE) %>%
  RunPCA() %>% FindNeighbors(dims = 1:20) %>%
  RunUMAP(dims = 1:20) %>%
  FindClusters(resolution = 0.1)
rm(human_coronary_countmatrix)
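The column renaming above can be checked in isolation before building the Seurat object. The sketch below is base R only and uses made-up barcodes; the `<barcode>.<sample>` layout is an assumption about how the column names look after `read.csv()`, and `swap_fields` is a hypothetical stand-in for `func`.

```r
# Made-up column names, assumed to follow the "<barcode>.<sample>" layout
barcodes <- c("AAACCTGAGACAGAGA.1", "AAACCTGCACCTGGTG.3", "TTTGTCATCGTAGGTT.8")

# Swap the two dot-separated fields: "<barcode>.<sample>" -> "<sample>_<barcode>"
swap_fields <- function(s) {
  parts <- strsplit(s, ".", fixed = TRUE)[[1]]
  paste0(parts[2], "_", parts[1])
}

renamed <- vapply(barcodes, swap_fields, character(1), USE.NAMES = FALSE)
print(renamed)

# The first "_"-delimited field is what would end up as the sample label
sample_id <- vapply(strsplit(renamed, "_", fixed = TRUE), `[`, character(1), 1)
print(table(sample_id))
```

Returning a character vector with `vapply()` rather than a list with `lapply()` keeps the assignment to `colnames()` type-safe.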
# Batch-read the count matrices
# The "gene" label in the row-name column has to be deleted first (edited by hand in VS Code)
count_mats <- list.files("./CA_GSE155512")
count_mats <- count_mats[count_mats != "sampleinfo.txt"]
allList <- lapply(count_mats, function(folder) {
  CreateSeuratObject(
    counts = read.csv(paste0("./CA_GSE155512/", folder), sep = "\t"),
    project = folder, min.cells = 10, min.features = 300
  )
})
# Merge the Seurat objects
CA_dataset1 <- merge(allList[[1]],
  y = allList[-1], add.cell.ids = count_mats,
  project = "CA_dataset1"
)
rm(allList)
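The manual header edit mentioned in the comment above could probably be avoided at read time. The following is a minimal sketch, assuming each file is tab-separated and its header row starts with a `gene` label over the gene-name column; the file contents are made up for illustration.

```r
# Write a tiny, made-up count matrix whose header row starts with "gene"
tsv_path <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tcell_1\tcell_2\tcell_3",
  "ACTA2\t5\t0\t2",
  "MGP\t1\t7\t0",
  "LUM\t0\t3\t4"
), tsv_path)

# row.names = 1 turns the first column into row names, so the "gene" label
# never becomes a data column and the file needs no manual editing
counts <- read.csv(tsv_path, sep = "\t", row.names = 1)
print(dim(counts))
print(rownames(counts))
unlink(tsv_path)
```

Whether this fits the real GSE155512 files depends on their actual header layout, which is not shown in this notebook.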
CA_dataset1 <- PercentageFeatureSet(CA_dataset1, pattern = "^MT-", col.name = "percent.mt") %>%
  subset(subset = nFeature_RNA > 600 & nFeature_RNA < 6000 & nCount_RNA > 1000 & nCount_RNA < 30000) %>%
  SCTransform(vars.to.regress = "percent.mt", verbose = FALSE) %>%
  RunPCA() %>% FindNeighbors(dims = 1:20) %>%
  RunUMAP(dims = 1:20) %>%
  FindClusters(resolution = 0.1)
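All three datasets use the same cell-level QC window (more than 600 and fewer than 6000 detected features, more than 1000 and fewer than 30000 counts). A quick way to preview how many cells such a window keeps is to apply the same logical expression to a plain data frame. The sketch below uses made-up numbers; only the column names and thresholds come from the notebook.

```r
# Made-up per-cell QC metrics; column names mirror Seurat's metadata columns
qc <- data.frame(
  nFeature_RNA = c(450, 800, 2500, 6500, 3000),
  nCount_RNA   = c(900, 1500, 8000, 40000, 31000),
  row.names    = paste0("cell_", 1:5)
)

# Same thresholds as the subset() calls in this notebook
keep <- with(qc, nFeature_RNA > 600 & nFeature_RNA < 6000 &
                 nCount_RNA > 1000 & nCount_RNA < 30000)

print(table(keep))
print(rownames(qc)[keep])
```

On a real object the same expression could be evaluated on the metadata data frame before calling `subset()`, to see how many cells each threshold removes.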
CA_dataset2 <- CreateSeuratObject(Read10X("./CA_GSE159677/"), names.field = 2, names.delim = "-",
  project = "CA_dataset2", min.cells = 10, min.features = 300) %>%
  PercentageFeatureSet(pattern = "^MT-", col.name = "percent.mt") %>%
  subset(subset = nFeature_RNA > 600 & nFeature_RNA < 6000 & nCount_RNA > 1000 & nCount_RNA < 30000) %>%
  SCTransform(vars.to.regress = "percent.mt", verbose = FALSE) %>%
  RunPCA() %>% FindNeighbors(dims = 1:20) %>%
  RunUMAP(dims = 1:20) %>%
  FindClusters(resolution = 0.1)
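Idents(human_coronary) <- human_coronary$orig.ident
# Assumption: the eight orig.ident levels ("1" to "8") pair up into four groups.
# Assigning a length-8 vector to Idents() would be recycled across cells rather than map levels.
human_coronary <- RenameIdents(human_coronary,
  "1" = "1", "2" = "1", "3" = "2", "4" = "2",
  "5" = "3", "6" = "3", "7" = "4", "8" = "4"
)
human_coronary$samples <- Idents(human_coronary)
Idents(human_coronary) <- human_coronary$seurat_clusters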
# Per-sample labels: orig.ident 1 to 6 become AC_1, PA_1, ..., PA_3
Idents(CA_dataset2) <- CA_dataset2$orig.ident
CA_dataset2 <- RenameIdents(CA_dataset2, "1" = "AC_1", "2" = "PA_1", "3" = "AC_2", "4" = "PA_2", "5" = "AC_3", "6" = "PA_3")
UMAPPlot(CA_dataset2)
CA_dataset2$sample <- Idents(CA_dataset2)
# Condition labels: collapse the samples to AC or PA
CA_dataset2 <- RenameIdents(CA_dataset2, "AC_1" = "AC", "PA_1" = "PA", "AC_2" = "AC", "PA_2" = "PA", "AC_3" = "AC", "PA_3" = "PA")
CA_dataset2$conditions <- Idents(CA_dataset2)
# Group labels: each consecutive AC/PA pair shares one sp_* label
Idents(CA_dataset2) <- CA_dataset2$orig.ident
CA_dataset2 <- RenameIdents(CA_dataset2, "1" = "sp_1", "2" = "sp_1", "3" = "sp_2", "4" = "sp_2", "5" = "sp_3", "6" = "sp_3")
CA_dataset2$groups <- Idents(CA_dataset2)
# Restore the cluster assignments as the active identity
Idents(CA_dataset2) <- CA_dataset2$seurat_clusters
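The three rounds of `RenameIdents()` above all encode one mapping from `orig.ident` to sample, condition, and group. As an alternative sketch, the same mapping can be written once as a lookup table and joined to the cells with `match()`. The per-cell identities below are made up; the table values are copied from the calls above, and `sample_map`, `cell_ident`, and `annot` are hypothetical names.

```r
# Lookup table copied from the RenameIdents() calls above
sample_map <- data.frame(
  orig.ident = as.character(1:6),
  sample     = c("AC_1", "PA_1", "AC_2", "PA_2", "AC_3", "PA_3"),
  stringsAsFactors = FALSE
)
# Derive condition ("AC"/"PA") and group ("sp_1".."sp_3") from the sample label
sample_map$conditions <- sub("_.*$", "", sample_map$sample)
sample_map$groups <- paste0("sp_", sub("^.*_", "", sample_map$sample))

# Made-up per-cell identities standing in for CA_dataset2$orig.ident
cell_ident <- c("1", "2", "2", "5", "6", "3")

# One row of annotations per cell, in cell order
idx <- match(cell_ident, sample_map$orig.ident)
annot <- sample_map[idx, c("sample", "conditions", "groups")]
rownames(annot) <- paste0("cell_", seq_along(cell_ident))
print(annot)
```

With a real object, a data frame like `annot` (row names set to the cell barcodes) could be attached with Seurat's `AddMetaData()`, which leaves the active identity untouched.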
saveRDS(human_coronary,"human_coronary.rds")
saveRDS(CA_dataset1,"CA_dataset1.rds")
saveRDS(CA_dataset2,"CA_dataset2.rds") # sample, condition, and group labels already assigned
Idents(CA_dataset2) <- CA_dataset2$conditions
# Named ds2_AC / ds2_PA to match the objects used in the rest of this section
ds2_AC <- subset(CA_dataset2, idents = "AC")
ds2_PA <- subset(CA_dataset2, idents = "PA")
ds2_PA <- ds2_PA %>% RunPCA() %>% FindNeighbors(dims = 1:20) %>%
  RunUMAP(dims = 1:20, seed.use = 20) %>%
  FindClusters(resolution = 0.1)
PC_ 1
Positive: CCL4, HLA-DRA, CD74, HLA-DRB1, HLA-DPA1, CCL5, SPP1, CCL4L2, LYZ, FTL
CCL3, FTH1, B2M, HLA-DPB1, C1QB, C1QA, TYROBP, FABP5, CD69, RNASE1
C1QC, HLA-DQA1, ACKR1, NKG7, CXCR4, AREG, HLA-DRB5, IL7R, SRGN, GZMK
Negative: TAGLN, ACTA2, MGP, MYL9, IGFBP7, TPM2, CALD1, RGS5, MYH11, TPM1
C11orf96, BGN, SPARCL1, CTGF, LUM, DSTN, PPP1R14A, IGFBP2, OGN, PLN
ADIRF, FHL1, COL14A1, SOD3, NOV, LMOD1, CYR61, MYH10, AEBP1, COL1A2
PC_ 2
Positive: ACTA2, MYL9, TAGLN, TPM2, MYH11, RGS5, PLN, C11orf96, CALD1, PPP1R14A
LMOD1, DSTN, MFAP4, CNN1, RAMP1, MYH10, ITGA8, C12orf75, TPM1, CSRP2
FILIP1L, ACTG2, CSRP1, SYNPO2, NEXN, FLNA, MYLK, MCAM, PPP1R12B, ACTC1
Negative: MGP, LUM, CFH, TIMP1, COL1A2, COL3A1, BGN, DCN, SFRP2, COL1A1
IGFBP7, CTGF, APOE, POSTN, FN1, AEBP1, CRTAC1, VCAN, C1R, NDUFA4L2
PCOLCE, COL6A3, THY1, CYR61, CCDC80, TNC, CCL19, GAP43, IGFBP3, PRSS23
PC_ 3
Positive: COL3A1, LUM, COL1A1, SPARCL1, DCN, THY1, COL6A3, COL1A2, MYH11, RGS5
POSTN, C11orf96, IBSP, APOE, TIMP3, TNC, PLN, SERPINF1, CALD1, COL6A1
FN1, CCL19, CPE, TIMP1, PCOLCE, COL6A2, MMP11, SFRP2, LMOD1, SPARC
Negative: MGP, CTGF, IGFBP2, IGFBP7, TNFRSF11B, FRZB, EFEMP1, OGN, CLU, NOV
IGFBP6, ACTC1, SUCNR1, ASPN, SOST, SFRP5, CRYAB, IGFBP3, COL8A1, OMD
PTN, DLX6-AS1, C2orf40, RAMP1, CYTL1, ID3, SGCG, GSN, BGN, PDE5A
PC_ 4
Positive: FN1, LUM, DCN, COL1A2, CRTAC1, COL3A1, MYH11, COL1A1, CLU, VCAN
MYH10, POSTN, IGFBP6, OGN, IBSP, TNFRSF11B, PRSS23, NOV, AEBP1, SLPI
SFRP4, PLN, LTBP1, MGP, COL8A1, HTRA1, DPT, PPP1R14A, ASPN, FAP
Negative: APOE, CCL19, IGFBP7, RGS5, NDUFA4L2, STEAP4, CCL21, CCDC102B, IGFBP5, C2orf40
COX4I2, FRZB, C7, RGS16, AGT, PGF, ID4, CYR61, PLAC9, SEPT4
HIGD1B, RARRES2, TDO2, LHFPL6, SOD3, NR2F2, ANGPT2, GGT5, THY1, MT2A
PC_ 5
Positive: SFRP2, C2orf40, TAGLN, CTGF, CYR61, CCL19, DCN, DPT, FRZB, IGFBP2
IGFBP7, PTGDS, RARRES2, CP, FMO2, FGF7, ACTC1, COL6A3, SERPINF1, CXCL12
TIMP3, THY1, TNC, COL1A1, SLPI, SFRP4, LUM, SOD3, CCL21, RAMP1
Negative: APOE, SPARCL1, MGP, TFPI2, RGS5, CFH, NDUFA4L2, GSN, MYH11, AGT
IGFBP3, PCOLCE2, SULF1, OGN, COL18A1, VCAN, ADH1B, DKK3, COL6A2, ASPN
APCDD1, COLEC11, BGN, DIO2, CRLF1, LTBP1, CRHBP, LHFPL6, EFEMP1, COL14A1
Computing nearest neighbor graph
Computing SNN
16:41:40 UMAP embedding parameters a = 0.9922 b = 1.112
16:41:40 Read 6498 rows and found 20 numeric columns
16:41:40 Using Annoy for neighbor search, n_neighbors = 30
16:41:40 Building Annoy index with metric = cosine, n_trees = 50
16:41:41 Writing NN index file to temp file /tmp/Rtmpptkzpe/file3f09af16ba45
16:41:41 Searching Annoy index using 1 thread, search_k = 3000
16:41:42 Annoy recall = 100%
16:41:43 Commencing smooth kNN distance calibration using 1 thread
16:41:44 Initializing from normalized Laplacian + noise
16:41:44 Commencing optimization for 500 epochs, with 273294 positive edges
16:41:50 Optimization finished
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck
Number of nodes: 6498
Number of edges: 224605
Running Louvain algorithm...
Maximum modularity in 10 random starts: 0.9344
Number of communities: 4
Elapsed time: 0 seconds
umapplot(ds2_AC, group.by = "Classification1")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
umapplot(ds2_PA, group.by = "Classification1")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
saveRDS(ds2_AC,"ds2_AC.rds")
saveRDS(ds2_PA,"ds2_PA.rds")